124 research outputs found
Illuminant Chromaticity Estimation from Interreflections
Reliable estimation of illuminant chromaticity is crucial for simulating
color constancy and for white balancing digital images. However, estimating
illuminant chromaticity from a single image is an ill-posed task, in general,
and existing solutions typically employ a variety of assumptions and
heuristics. In this paper, we present a new, physically-based approach for
estimating illuminant chromaticity from interreflections of light between
diffuse surfaces. Our approach assumes that all of the direct illumination in
the scene has the same chromaticity, and that at least two areas where
interreflections between Lambertian surfaces occur may be detected in the
image. No further assumptions or restrictions on the illuminant chromaticity or
the shading in the scene are necessary. Our approach is based on representing
interreflections as lines in a special 2D color space, and the chromaticity of
the illuminant is estimated from the approximate intersection between two or
more such lines. Experimental results are reported on a dataset of illumination
and surface reflectance spectra, as well as on real images we captured. The
results indicate that our approach can yield state-of-the-art results when the
interreflections are significant enough to be captured by the camera.
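To make the geometric step concrete, here is a minimal sketch of estimating the illuminant as the least-squares intersection of two or more lines in a 2D chromaticity space. The PCA-based line fit and the synthetic usage data are illustrative assumptions; the paper's specific 2D color space construction is not reproduced here.

```python
import numpy as np

def fit_line(points):
    """Fit a 2D line to (N, 2) points via PCA: returns (centroid, unit direction)."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    return c, vt[0]

def intersect_lines(lines):
    """Least-squares point closest to all lines, each given as (point, direction)."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, d in lines:
        P = np.eye(2) - np.outer(d, d)  # projector orthogonal to the line
        A += P
        b += P @ p
    return np.linalg.solve(A, b)

# Hypothetical usage with synthetic data: two lines through (0.31, 0.33).
t = np.linspace(0, 1, 50)[:, None]
pixels_1 = np.array([0.31, 0.33]) + t * np.array([1.0, 0.2])
pixels_2 = np.array([0.31, 0.33]) + t * np.array([-0.3, 1.0])
print(intersect_lines([fit_line(p) for p in (pixels_1, pixels_2)]))  # ~ (0.31, 0.33)
```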
Evaluation and Comparison of Edge-Preserving Filters
Edge-preserving filters play an essential role in some of the most basic
tasks of computational photography, such as abstraction, tonemapping, detail
enhancement and texture removal, to name a few. The abundance and diversity of
smoothing operators, accompanied by a lack of methodology to evaluate output
quality and/or perform an unbiased comparison between them, could lead to
misunderstanding and potential misuse of such methods. This paper introduces a
systematic methodology for evaluating and comparing such operators and
demonstrates it on a diverse set of published edge-preserving filters.
Additionally, we present a common baseline along which a comparison of
different operators can be achieved and use it to determine equivalent
parameter mappings between methods. Finally, we suggest some guidelines for
objective comparison and evaluation of edge-preserving filters.
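One illustrative (not the paper's) way to realize such a common baseline: sweep each filter's main parameter, record a simple smoothing-amount proxy, and call two parameter values equivalent when they reach the same proxy value. The metric and the hypothetical filter callables below are assumptions.

```python
import numpy as np

def smoothing_amount(original, filtered):
    """Crude proxy for smoothing strength: energy removed from the image."""
    return np.mean((original - filtered) ** 2)

def equivalent_parameter(filter_fn, image, target_amount, grid):
    """Pick the parameter whose smoothing amount best matches a target level."""
    scores = np.array([smoothing_amount(image, filter_fn(image, p)) for p in grid])
    return grid[int(np.argmin(np.abs(scores - target_amount)))]

# Hypothetical usage: map a bilateral-filter sigma to a guided-filter epsilon
# that removes a comparable amount of detail from the same image.
# eps = equivalent_parameter(guided, image,
#                            smoothing_amount(image, bilateral(image, sigma)),
#                            grid=np.logspace(-4, 0, 20))
```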
Learning Character-Agnostic Motion for Motion Retargeting in 2D
Analyzing human motion is a challenging task with a wide variety of
applications in computer vision and in graphics. One such application, of
particular importance in computer animation, is the retargeting of motion from
one performer to another. While humans move in three dimensions, the vast
majority of human motions are captured using video, requiring 2D-to-3D pose and
camera recovery before existing retargeting approaches can be applied. In this
paper, we present a new method for retargeting video-captured motion between
different human performers, without the need to explicitly reconstruct 3D poses
and/or camera parameters. In order to achieve our goal, we learn to extract,
directly from a video, a high-level latent motion representation, which is
invariant to the skeleton geometry and the camera view. Our key idea is to
train a deep neural network to decompose temporal sequences of 2D poses into
three components: motion, skeleton, and camera view-angle. Having extracted
such a representation, we are able to re-combine motion with novel skeletons
and camera views, and decode a retargeted temporal sequence, which we compare
to a ground truth from a synthetic dataset. We demonstrate that our framework
can be used to robustly extract human motion from videos, bypassing 3D
reconstruction, and outperforming existing retargeting methods, when applied to
videos in-the-wild. It also enables additional applications, such as
performance cloning, video-driven cartoons, and motion retrieval. (SIGGRAPH 2019)
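As a rough illustration of the decomposition (not the paper's exact network), the following PyTorch sketch splits a 2D pose sequence into a time-varying motion code and static skeleton and view codes, which a decoder recombines; all layer shapes and the 1D-convolutional design are assumptions.

```python
import torch
import torch.nn as nn

class MotionDecomposer(nn.Module):
    def __init__(self, pose_dim=30, motion_dim=64, skel_dim=16, view_dim=8):
        super().__init__()
        def enc(out_dim):
            return nn.Sequential(
                nn.Conv1d(pose_dim, 64, 7, padding=3), nn.ReLU(),
                nn.Conv1d(64, out_dim, 7, padding=3))
        self.motion_enc = enc(motion_dim)  # time-varying motion code
        self.skel_enc = enc(skel_dim)      # pooled into a static skeleton code
        self.view_enc = enc(view_dim)      # pooled into a static view-angle code
        self.decoder = nn.Sequential(
            nn.Conv1d(motion_dim + skel_dim + view_dim, 64, 7, padding=3),
            nn.ReLU(),
            nn.Conv1d(64, pose_dim, 7, padding=3))

    def forward(self, poses):  # poses: (batch, pose_dim, time)
        m = self.motion_enc(poses)
        s = self.skel_enc(poses).mean(dim=2, keepdim=True)  # temporal pooling
        v = self.view_enc(poses).mean(dim=2, keepdim=True)
        T = m.shape[2]
        code = torch.cat([m, s.expand(-1, -1, T), v.expand(-1, -1, T)], dim=1)
        return self.decoder(code)

# Retargeting amounts to decoding performer A's motion code together with
# performer B's skeleton code (and a chosen view code).
```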
CrossNet: Latent Cross-Consistency for Unpaired Image Translation
Recent GAN-based architectures have been able to deliver impressive
performance on the general task of image-to-image translation. In particular,
it was shown that a wide variety of image translation operators may be learned
from two image sets, containing images from two different domains, without
establishing an explicit pairing between the images. This was made possible by
introducing clever regularizers to overcome the under-constrained nature of the
unpaired translation problem. In this work, we introduce a novel architecture
for unpaired image translation, and explore several new regularizers enabled by
it. Specifically, our architecture comprises a pair of GANs, as well as a pair
of translators between their respective latent spaces. These cross-translators
enable us to impose several regularizing constraints on the learnt image
translation operator, collectively referred to as latent cross-consistency. Our
results show that our proposed architecture and latent cross-consistency
constraints are able to outperform the existing state-of-the-art on a variety
of image translation tasks.
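The central constraint is easy to state in code. Below is a hedged sketch of one latent cross-consistency term, with enc_a, t_ab, and t_ba standing in for a domain-A encoder and the two cross-translators; these names and the L1 penalty are illustrative assumptions.

```python
import torch.nn.functional as F

def latent_cross_consistency(x_a, enc_a, t_ab, t_ba):
    """Round-trip a domain-A latent code through domain B's latent space."""
    z_a = enc_a(x_a)                 # latent code in domain A's space
    z_b = t_ab(z_a)                  # cross-translated to domain B's space
    z_a_back = t_ba(z_b)             # translated back to domain A's space
    return F.l1_loss(z_a_back, z_a)  # penalize the round-trip discrepancy
```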
Unsupervised multi-modal Styled Content Generation
The emergence of deep generative models has recently enabled the automatic
generation of massive amounts of graphical content, both in 2D and in 3D.
Generative Adversarial Networks (GANs) and style control mechanisms, such as
Adaptive Instance Normalization (AdaIN), have proved particularly effective in
this context, culminating in the state-of-the-art StyleGAN architecture. While
such models are able to learn diverse distributions, provided a sufficiently
large training set, they are not well-suited for scenarios where the
distribution of the training data exhibits a multi-modal behavior. In such
cases, reshaping a uniform or normal distribution over the latent space into a
complex multi-modal distribution in the data domain is challenging, and the
generator might fail to sample the target distribution well. Furthermore,
existing unsupervised generative models are not able to control the mode of the
generated samples independently of other visual attributes, even though the two
are typically disentangled in the training data.
In this paper, we introduce UMMGAN, a novel architecture designed to better
model multi-modal distributions, in an unsupervised fashion. Building upon the
StyleGAN architecture, our network learns multiple modes, in a completely
unsupervised manner, and combines them using a set of learned weights. We
demonstrate that this approach is capable of effectively approximating a
complex distribution as a superposition of multiple simple ones. We further
show that UMMGAN effectively disentangles between modes and style, thereby
providing an independent degree of control over the generated content.
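To illustrate the mode-combination idea, here is a minimal sketch in which K learned mode embeddings are blended by per-sample weights into a single code that conditions the generator; the dimensions and the softmax blend are assumptions, not UMMGAN's exact mechanism.

```python
import torch
import torch.nn as nn

class ModeMixer(nn.Module):
    """Blend K learned mode embeddings into one code conditioning the generator."""
    def __init__(self, num_modes=4, latent_dim=512):
        super().__init__()
        self.modes = nn.Parameter(torch.randn(num_modes, latent_dim))
        self.weight_net = nn.Linear(latent_dim, num_modes)

    def forward(self, z):  # z: (batch, latent_dim), sampled latent vectors
        w = torch.softmax(self.weight_net(z), dim=1)  # per-sample mode weights
        return w @ self.modes  # superposition of modes, shape (batch, latent_dim)
```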
Cross-Domain Cascaded Deep Feature Translation
In recent years we have witnessed tremendous progress in unpaired
image-to-image translation methods, propelled by the emergence of DNNs and
adversarial training strategies. However, most existing methods focus on
transfer of style and appearance, rather than on shape translation. The latter
task is challenging, due to its intricate non-local nature, which calls for
additional supervision. We mitigate this by descending the deep layers of a
pre-trained network, where the deep features contain more semantics, and
applying the translation from and between these deep features. Specifically, we
leverage VGG, which is a classification network, pre-trained with large-scale
semantic supervision. Our translation is performed in a cascaded,
deep-to-shallow fashion, along the deep feature hierarchy: we first translate
between the deepest layers that encode the higher-level semantic content of the
image, proceeding to translate the shallower layers, conditioned on the deeper
ones. We show that our method is able to translate between different domains,
which exhibit significantly different shapes. We evaluate our method both
qualitatively and quantitatively and compare it to state-of-the-art
image-to-image translation methods. Our code and trained models will be made
available.
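The cascade itself can be summarized schematically. In the sketch below, features holds pre-extracted VGG feature maps ordered deepest first, and translators are hypothetical per-level networks; only the deep-to-shallow conditioning pattern is taken from the abstract.

```python
def cascaded_translate(features, translators):
    """features: source-domain VGG feature maps, ordered deepest first.
    translators[0] maps the deepest level on its own; each later translator
    takes (current level, already-translated deeper level)."""
    translated = [translators[0](features[0])]
    for feat, t in zip(features[1:], translators[1:]):
        translated.append(t(feat, translated[-1]))  # condition on deeper result
    return translated  # translated features, deepest to shallowest
```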
Shape-Pose Disentanglement using SE(3)-equivariant Vector Neurons
We introduce an unsupervised technique for encoding point clouds into a
canonical shape representation, by disentangling shape and pose. Our encoder is
stable and consistent, meaning that the shape encoding is purely
pose-invariant, while the extracted rotation and translation are able to
semantically align different input shapes of the same class to a common
canonical pose. Specifically, we design an auto-encoder based on Vector Neuron
Networks, a rotation-equivariant neural architecture, whose layers we extend to
provide translation-equivariance in addition to rotation-equivariance. The
resulting encoder produces pose-invariant shape encoding by construction,
enabling our approach to focus on learning a consistent canonical pose for a
class of objects. Quantitative and qualitative experiments validate the
superior stability and consistency of our approach.
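For intuition about the equivariance bookkeeping, a minimal numpy sketch: translation-equivariance comes from centering on the centroid, and a hypothetical equivariant_rotation callable stands in for the encoder that produces the rotation. The paper realizes these properties with learned Vector Neuron layers rather than this explicit recipe.

```python
import numpy as np

def canonicalize(points, equivariant_rotation):
    """points: (N, 3). Returns (canonical_points, R, t) such that
    points ~ canonical_points @ R.T + t."""
    t = points.mean(axis=0)              # translation part of the pose
    centered = points - t                # translation-invariant by construction
    R = equivariant_rotation(centered)   # hypothetical (3, 3) rotation output
    canonical = centered @ R             # rotation-normalized shape
    return canonical, R, t
```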
Neuron-level Selective Context Aggregation for Scene Segmentation
Contextual information provides important cues for disambiguating visually
similar pixels in scene segmentation. In this paper, we introduce a
neuron-level Selective Context Aggregation (SCA) module for scene segmentation,
comprising a contextual dependency predictor and a context aggregation
operator. The dependency predictor is implicitly trained to infer contextual
dependencies between different image regions. The context aggregation operator
augments local representations with global context, which is aggregated
selectively at each neuron according to its on-the-fly predicted dependencies.
The proposed mechanism enables data-driven inference of contextual
dependencies, and facilitates context-aware feature learning. The proposed
method improves strong baselines built upon VGG16 on challenging scene
segmentation datasets, demonstrating its effectiveness in modeling contextual
information.
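A simplified PyTorch rendering of the aggregation step, written as attention-style weighting: a predictor scores how much every position should contribute to each location, and the aggregated context augments the local features. The 1x1-convolution predictor and position-level (rather than strictly neuron-level) weighting are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SelectiveContextAggregation(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)

    def forward(self, x):  # x: (batch, channels, H, W)
        B, C, H, W = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (B, HW, C//8)
        k = self.key(x).flatten(2)                    # (B, C//8, HW)
        dep = torch.softmax(q @ k, dim=-1)            # predicted dependencies
        v = x.flatten(2).transpose(1, 2)              # (B, HW, C)
        context = (dep @ v).transpose(1, 2).reshape(B, C, H, W)
        return x + context  # local features augmented with aggregated context
```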
SAGNet: Structure-aware Generative Network for 3D-Shape Modeling
We present SAGNet, a structure-aware generative model for 3D shapes. Given a
set of segmented objects of a certain class, the geometry of their parts and
the pairwise relationships between them (the structure) are jointly learned and
embedded in a latent space by an autoencoder. The encoder intertwines the
geometry and structure features into a single latent code, while the decoder
disentangles the features and reconstructs the geometry and structure of the 3D
model. Our autoencoder consists of two branches, one for the structure and one
for the geometry. The key idea is that during the analysis, the two branches
exchange information between them, thereby learning the dependencies between
structure and geometry and encoding two augmented features, which are then
fused into a single latent code. This explicit intertwining of information
enables separately controlling the geometry and the structure of the generated
models. We evaluate the performance of our method and conduct an ablation
study. We explicitly show that the encoding of shapes accounts for similarities
in both structure and geometry. A variety of high-quality results generated by
SAGNet are presented. The data and code are available at
https://github.com/zhijieW-94/SAGNet. (Accepted by SIGGRAPH 2019)
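The two-branch exchange can be sketched structurally as follows; the linear layers, single exchange step, and dimensions are placeholders (the paper's branches are more elaborate), but the augment-then-fuse pattern matches the description above.

```python
import torch
import torch.nn as nn

class TwoBranchEncoder(nn.Module):
    def __init__(self, geo_dim, struct_dim, hidden=256, latent=128):
        super().__init__()
        self.geo = nn.Linear(geo_dim, hidden)
        self.struct = nn.Linear(struct_dim, hidden)
        self.geo_mix = nn.Linear(2 * hidden, hidden)     # exchange step
        self.struct_mix = nn.Linear(2 * hidden, hidden)  # exchange step
        self.fuse = nn.Linear(2 * hidden, latent)

    def forward(self, geometry, structure):
        g = torch.relu(self.geo(geometry))
        s = torch.relu(self.struct(structure))
        # Each branch is augmented with the other branch's features.
        g2 = torch.relu(self.geo_mix(torch.cat([g, s], dim=-1)))
        s2 = torch.relu(self.struct_mix(torch.cat([s, g], dim=-1)))
        return self.fuse(torch.cat([g2, s2], dim=-1))  # single fused latent code
```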
DiDA: Disentangled Synthesis for Domain Adaptation
Unsupervised domain adaptation aims at learning a shared model for two
related, but not identical, domains by leveraging supervision from a source
domain to an unsupervised target domain. A number of effective domain
adaptation approaches rely on the ability to extract discriminative, yet
domain-invariant, latent factors which are common to both domains. Extracting
latent commonality is also useful for disentanglement analysis, enabling
separation between the common and the domain-specific features of both domains.
In this paper, we present a method for boosting domain adaptation performance
by leveraging disentanglement analysis. The key idea is that by learning to
separately extract both the common and the domain-specific features, one can
synthesize more target domain data with supervision, thereby boosting the
domain adaptation performance. Better common feature extraction, in turn, helps
further improve the disentanglement analysis and disentangled synthesis. We
show that iterating between domain adaptation and disentanglement analysis can
consistently improve each other on several unsupervised domain adaptation
tasks, for various domain adaptation backbone models.
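The alternation can be written as a short loop over placeholder callables; everything below (the callables, their dictionary outputs, the round count) is assumed structure, illustrating only the iterate-and-synthesize idea.

```python
def dida_loop(source, target, adapt, disentangle, synthesize, rounds=3):
    """Alternate between domain adaptation and disentangled synthesis."""
    model = adapt(source, target)  # initial adaptation
    for _ in range(rounds):
        # Split features into common and domain-specific parts.
        common, specific = disentangle(model, source, target)
        # Labeled source content + target-specific factors -> supervised
        # pseudo-target samples.
        synthetic = synthesize(common["source"], specific["target"])
        model = adapt(source + synthetic, target)  # re-adapt with more data
    return model
```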